Restricted trees: simplifying networks with bottlenecks

Stephen J. Willson 

Department of Mathematics 

Iowa State University 

Ames, IA 50011 USA

swillson@iastate.edu 



May 28, 2010

Abstract. Suppose $N$ is a phylogenetic network indicating a complicated
relationship among individuals and taxa. Often of interest is a much simpler
network, for example, a species tree $T$, that summarizes the most fundamental
relationships. The meaning of a species tree is made more complicated by the
recent discovery of the importance of hybridizations and lateral gene transfers.
Hence it is desirable to describe uniform well-defined procedures that yield a
tree given a network $N$.

A useful tool toward this end is a connected surjective digraph (CSD) map
$\phi : N \to N'$ where $N'$ is generally a much simpler network than $N$. A set
$W$ of vertices in $N$ is "restricted" if there is at most one vertex from which
there is an arc into $W$, thus yielding a bottleneck in $N$. A CSD map
$\phi : N \to N'$ is "restricted" if the inverse image of each vertex in $N'$ is
restricted in $N$. This paper describes a uniform procedure that, given a
network $N$, yields a well-defined tree called the "restricted tree" of $N$.
There is a restricted CSD map from $N$ to the restricted tree. Many
relationships in the tree can be proved to appear also in $N$.

Key words: digraph, network, tree, connected, hybrid, phylogeny,
homomorphism, restricted, phylogenetic network

1 Introduction 

Since Darwin, phylogenetic trees have been utilized to display the evolutionary 
relationships among taxa. Extant taxa correspond to the leaves of the trees. In 
principle, the trees are directed in the direction of increasing time, and there is 
a single root indicating the common ancestry of all the taxa in question. 

The underlying reality is often a much more complicated network than a 
tree. If every vertex corresponds to an individual and the species are
sexually reproducing, then the underlying graph has vast numbers of vertices, each
with indegree 2. This underlying reality is too complicated to reconstruct. The 
species phylogenetic tree is a dramatic simplification which summarizes the un- 
derlying reality. 

More recently, events such as hybridization and lateral gene transfer have 
been shown to have increased importance [9], [6]. Such possibilities have called 
into question the adequacy of a phylogenetic species tree as a tool. Coalescence 
methods [20], [8] have modeled relationships between species trees and gene 
trees making use of a presumed network of the underlying reality. Moreover, 
specific biological networks have been proposed for certain systems [6], [15]. 

Once we start to consider networks more general than trees, we must be 
concerned about the assumptions that can be made about these networks. There 
are astronomically more networks than even the large number of trees with a 
given leaf set. Hence it becomes important to narrow the collection in a useful 
manner. General frameworks for networks are discussed in [1], [2], [16], [17], and 
[18]. Typically these frameworks model phylogenies by acyclic rooted directed 
graphs. 

Particular kinds of networks have been studied in various papers. Wang et 
al. [22] and Gusfield et al. [11] study "galled trees" in which all recombination 
events are associated with node-disjoint recombination cycles. Van Iersel and
others [14] generalized galled trees to "level-$k$" networks. Baroni, Semple, and
Steel [2] introduced the idea of a "regular" network, which coincides with its
cover digraph. Cardona et al. [5] discussed "tree-child" networks, in which
every vertex not a leaf has a child that is not a reticulation vertex. Moret et al.
[16] define a reduction $R(N)$ of a network $N$ of use in analyzing displayed trees.

The possibilities of very complicated networks raise anew the question of the 
relationship between the hugely complex underlying reality and the phylogenetic 
trees and networks which simplify and summarize possible relationships. 

Dress et al. [10] give several abstract constructions of manners in which a 
very general network can give rise to trees, or, more generally, hierarchies. For 
example, they define notions of tight clusters and strict clusters and show that 
these produce trees or hierarchies. Both notions identify a kind of bottleneck in 
the underlying network and produce trees. 

In [23] the current author described a general approach giving relationships
between a complicated underlying network $N$ and a much simpler network $N'$.
For example, $N$ might be the largely unknown directed graph showing the
underlying reality while $N'$ might be the species tree. The basic tool is a
connected surjective digraph map or, more briefly, a CSD map from $N$ to $N'$.
The idea is that every vertex $v$ of $N$ is taken to a vertex $\phi(v)$ of $N'$
in such a manner that the following hold:

(a) If $(u, v)$ is an arc of $N$, then either $\phi(u) = \phi(v)$ or else
$(\phi(u), \phi(v))$ is an arc of $N'$.

(b) The map is surjective both on vertices of N' and on arcs of N' . 

(c) For each vertex v' of N' , the set of vertices of N mapping to v' forms a 
connected set. 

Details are given in section 2. 



Many properties of CSD maps are given in [23]. While (a) is very similar to
the notion of a homomorphism of digraphs [12], [13], the essential new condition
is (c). Without (c), knowledge of $N'$ gives very little information about $N$; the
notion without (c) is too general. With (c), the notion is much more rigid, and
information about $N'$ implies structure in $N$. For example, if $N'$ is a binary
tree and $\phi : N \to N'$ is a CSD map, then there is a wired lift of $N'$ into
$N$, showing that as an undirected network $N'$ embeds in $N$. If, instead,
$\phi : N \to N'$ satisfied merely (a) and (b), then when $N'$ is a binary tree,
$N$ could still be trivial or a star tree. Further details are given in section 2
and [23].

The cluster of a vertex $v$ in a network $N$ is the set of leaves which can be
reached by directed paths starting at $v$. A network $N$ is successively
cluster-distinct if whenever $(u, v)$ is an arc, then $u$ and $v$ have distinct
clusters. In [23] I gave a construction, given any network $N$, of a successively
cluster-distinct network $ClDis(N)$. I showed that there is a CSD map
$\phi : N \to ClDis(N)$, and moreover that $\phi$ had a certain "universal"
property. I argued that it was therefore reasonable to restrict one's attention
to networks that were successively cluster-distinct.

In this paper I elaborate further. Given a network $N$, I describe a general
method to construct a restricted tree denoted $ResTr(N)$. In some ways the
procedure resembles that given in [10] of tight clusters in that it detects
bottlenecks of a certain sort. The construction differs, however, in that it
always yields a CSD map $\phi : N \to ResTr(N)$; the construction in [10] may
not have this property.

The computation of $ResTr(N)$ will typically have more resolution when it
is applied to a network $N$ that is already successively cluster-distinct.

The heart of the construction is the notion of a restricted set $B$, given in
section 3. Such a set $B$ is a set of vertices in $N$ such that there is at most
one vertex $u$ for which there is any arc $(u, w)$ with $u \notin B$ but
$w \in B$. Such a vertex identifies a bottleneck in the network $N$. It is shown
in Section 3 how to construct the smallest restricted set $R(v)$ containing a
given vertex $v$. These sets are utilized to construct $ResTr(N)$.

Section 4 focuses on properties of restricted CSD maps, that is, those CSD maps
for which the inverse image of each vertex is a restricted set. It is shown that
any such map defined on $N$ factors through $ResTr(N)$, making $ResTr(N)$
"universal" for such maps. Thus $ResTr(N)$ not only permits wired lifts into the
network $N$, but any restricted map factors through $ResTr(N)$. Hence $ResTr(N)$
is an invariantly defined tree with interesting universal properties.

Section 4 also contains an example of the construction of ResTr(N). 

2 Fundamental Concepts 

A directed graph or digraph $N = (V, A)$ consists of a finite set $V$ of vertices
and a finite set $A$ of arcs, each consisting of an ordered pair $(u, v)$ where
$u \in V$, $v \in V$, $u \ne v$. Sometimes we write $V(N)$ for $V$. We interpret
$(u, v)$ as an arrow from $u$ to $v$ and say that the arc starts at $u$ and ends
at $v$. There are no multiple arcs and no loops. If $(u, v) \in A$, say that $u$
is a parent of $v$ and $v$ is a child of $u$. A directed path is a sequence
$u_0, u_1, \ldots, u_k$ of vertices such that for $i = 1, \ldots, k$,
$(u_{i-1}, u_i) \in A$. The path is trivial if $k = 0$. Write $u \le v$ if there
is a directed path starting at $u$ and ending at $v$. Write $u < v$ if $u \le v$
and $u \ne v$. The digraph is acyclic if there is no nontrivial directed path
starting and ending at the same point. If the digraph is acyclic, it is easy to
see that $\le$ is a partial order on $V$.

The digraph $(V, A)$ has root $r$ if there exists $r \in V$ such that for all
$v \in V$, $r \le v$. The graph is rooted if it has a root.

The indegree of vertex $u$ is the number of $v \in V$ such that $(v, u) \in A$.
The outdegree of $u$ is the number of $v \in V$ such that $(u, v) \in A$. If
$(V, A)$ is rooted at $r$ then $r$ is the only vertex of indegree 0. A leaf is a
vertex of outdegree 0.
A normal (or tree-child) vertex is a vertex of indegree 1. A hybrid vertex (or 
recombination vertex or reticulation node) is a vertex of indegree at least 2. 

Let $X$ denote a finite set. Typically in phylogeny, $X$ is a collection of
species. An $X$-network $N = (V, A, r, X)$ is a digraph $(V, A)$ with root $r$
such that

(1) there is a one-to-one map $\phi : X \to V$ such that the image of $\phi$ is
the set of all leaves of $(V, A)$, and

(2) for every $v \in V$ there is a leaf $u$ and a directed path from $v$ to $u$.

Thus the set of leaves of $N$ may be identified with the set $X$; every vertex is
ancestral to a leaf.

In biology most $X$-networks are acyclic. The set $X$ provides a context for $N$,
giving a hypothesized relationship among the members of $X$. For convenience,
we will write $x$ for the leaf $\phi(x)$.

An $X$-tree is an $X$-network such that the underlying digraph is a rooted
tree.

If $N = (V, A, r, X)$ is an $X$-network and $v \in V$, the cluster of $v$,
denoted $cl(v)$, is $\{x \in X : v \le x\}$. We say that $N$ is successively
cluster-distinct provided that, whenever $(u, v)$ is an arc, then
$cl(u) \ne cl(v)$.
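
The cluster definition can be illustrated with a minimal Python sketch. It assumes the network is acyclic and is given as a list of arcs plus a set of leaf names; the function names `clusters` and `is_successively_cluster_distinct` are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def clusters(arcs, leaves):
    """Map each vertex v to cl(v), the set of leaves reachable from v.

    Assumes an acyclic digraph in which every vertex is ancestral to a leaf.
    """
    children = defaultdict(set)
    vertices = set(leaves)
    for u, v in arcs:
        children[u].add(v)
        vertices.update((u, v))
    memo = {}

    def cl(v):
        if v not in memo:
            if v in leaves:
                memo[v] = frozenset([v])      # trivial path from a leaf to itself
            else:
                memo[v] = frozenset().union(*(cl(c) for c in children[v]))
        return memo[v]

    return {v: cl(v) for v in vertices}

def is_successively_cluster_distinct(arcs, leaves):
    """True if cl(u) != cl(v) for every arc (u, v)."""
    cl = clusters(arcs, leaves)
    return all(cl[u] != cl[v] for u, v in arcs)

# Illustrative network (not from the paper): h is a hybrid vertex with parents a and b.
arcs = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("h", "x"), ("a", "y"), ("b", "z")]
leaves = {"x", "y", "z"}
print(clusters(arcs, leaves)["a"])
print(is_successively_cluster_distinct(arcs, leaves))
```

In this toy network the arc from h to x joins two vertices with the same cluster, so the network is not successively cluster-distinct.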

Let $N = (V, A, r, X)$ and $N' = (V', A', r', X)$ be $X$-networks. An
$X$-isomorphism $\psi : N \to N'$ is a map $\psi : V \to V'$ such that

(1) $\psi : V \to V'$ is one-to-one and onto,

(2) $\psi(r) = r'$,

(3) for each $x \in X$, $\psi(x) = x$,

(4) $(\psi(u), \psi(v))$ is an arc of $N'$ iff $(u, v)$ is an arc of $N$.

We say $N$ and $N'$ are isomorphic if there is an $X$-isomorphism
$\psi : N \to N'$.

A graph (or, for emphasis, an undirected graph) $(V, E)$ consists of a finite
set $V$ of vertices and a finite set $E$ of edges, each consisting of a subset
$\{v_1, v_2\}$ where $v_1$ and $v_2$ are two distinct members of $V$. Thus an
edge has no direction, while an arc has a direction. If $G = (V, E)$ is a graph
and $W$ is a subset of $V$, the induced subgraph $G[W]$ is the graph
$(W, E[W])$ where the edge set $E[W]$ is the collection of all $\{v_1, v_2\}$ in
$E$ such that $v_1 \in W$ and $v_2 \in W$. Thus $G[W]$ contains all edges both of
whose endpoints are in $W$.

A graph $G = (V, E)$ is connected if, given any two distinct $v$ and $w$ in
$V$ there exists a sequence $v = v_0, v_1, v_2, \ldots, v_k = w$ of vertices such
that for $i = 0, \ldots, k-1$, $\{v_i, v_{i+1}\} \in E$. A subset $W$ of $V$ is
connected if the induced subgraph $G[W]$ is connected.

Given a digraph $G = (V, A)$ define $Und(G) = (V, E)$ where $E = \{\{u, v\} :$
there is an arc $(u, v) \in A\}$. Then $Und(G)$ is an undirected graph with the
same vertex set as $G$ and with edges obtained by ignoring the directions of arcs.
A subset $W$ of $V$ is connected if $Und(G)[W]$ is connected. Thus a connected
subset of $G$ is defined ignoring the directions of arcs.

Let $N = (V, A, r, X)$ and $N' = (V', A', r', X)$ be $X$-networks whose leaf sets
are identified with the same set $X$. An $X$-digraph map $f : N \to N'$ is a map
$f : V \to V'$ such that

(a) $f(r) = r'$,

(b) for all $x \in X$, $f(x) = x$, and

(c) if $(u, v)$ is an arc of $N$, then either $f(u) = f(v)$ or else
$(f(u), f(v))$ is an arc of $N'$.

Call $f$ connected if for each $v' \in V'$, $f^{-1}(v')$ is a connected subset of
$N$, i.e., if the induced subgraph $Und(N)[f^{-1}(v')]$ is connected. Call $f$
surjective if for each $v' \in V'$, $f^{-1}(v')$ is nonempty and for each arc
$(a, b)$ of $N'$ there exist vertices $u$ and $v$ of $N$ such that $(u, v)$ is an
arc of $N$, $f(u) = a$, and $f(v) = b$. The kernel of $f$ is the partition
$\{f^{-1}(v') : v' \in V'\}$ of $V$.

We are interested primarily in $X$-digraph maps that are both connected and
surjective. They will be called connected surjective digraph maps or CSD maps. 
Many of their properties are analogous to properties of homomorphisms [13] but 
properties involving the leaf set X and connectivity require special attention. 
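
The definitions above can be checked mechanically on small examples. The following Python sketch tests whether a vertex map is a CSD map; the function names, the arc-list representation, and the use of a dictionary `f` for the map are all assumptions made for illustration, not notation from [23].

```python
from collections import defaultdict

def is_connected_subset(arcs, subset):
    """True if `subset` is connected in Und(N), i.e. ignoring arc directions."""
    subset = set(subset)
    if len(subset) <= 1:
        return True                      # vacuously connected
    adj = defaultdict(set)
    for u, v in arcs:
        if u in subset and v in subset:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == subset

def is_csd_map(arcs, root, arcs2, root2, leaves, f):
    """Check that f (dict: V -> V') is a connected surjective X-digraph map."""
    arcs2 = set(arcs2)
    vertices2 = {root2} | {w for arc in arcs2 for w in arc}
    # Conditions (a) and (b): root goes to root, leaves are fixed.
    if f[root] != root2 or any(f[x] != x for x in leaves):
        return False
    # Condition (c), collecting the arcs of N' that are hit along the way.
    image_arcs = set()
    for u, v in arcs:
        if f[u] != f[v]:
            if (f[u], f[v]) not in arcs2:
                return False
            image_arcs.add((f[u], f[v]))
    # Surjectivity on vertices and on arcs.
    if image_arcs != arcs2 or set(f.values()) != vertices2:
        return False
    # Connectedness of every member of the kernel.
    fibres = defaultdict(set)
    for v, image in f.items():
        fibres[image].add(v)
    return all(is_connected_subset(arcs, fibre) for fibre in fibres.values())
```

For instance, collapsing a hybrid vertex together with both of its parents into one vertex gives a connected fibre, whereas collapsing only the two parents (which share no arc) does not.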

The following basic results are in the paper [23].

Let $N = (V, A, r, X)$ be an $X$-network. If $\sim$ is an equivalence relation on
$V$, denote by $[v]$ the equivalence class of the vertex $v$. An equivalence
relation $\sim$ on $V$ is called leaf-preserving provided that for every
$x \in X$ whenever $u \in [x]$ and $(u, v)$ is an arc, then $v \in [x]$.

Let $N = (V, A, r, X)$ be an $X$-network. Suppose $\sim$ is an equivalence
relation on $V$. Let $V' = \{[v] : v \in V\}$ be the partition of $V$ into
equivalence classes. Define the quotient digraph $N'$ by
$N' = (V', A', r', X)$ where

(i) $V'$ is the set of equivalence classes $[v]$.

(ii) $r' = [r]$.

(iii) The member $x \in X$ corresponds to $[x]$; i.e., the identification is
given by $\phi' : X \to V'$ by $\phi'(x) = [\phi(x)]$.

(iv) Let $[u]$ and $[v]$ be two equivalence classes. There is an arc
$([u], [v]) \in A'$ iff $[u] \ne [v]$ and there exist $u' \in [u]$ and
$v' \in [v]$ such that $(u', v') \in A$.

Alternative notations for $N'$ will be $N/\sim$ or $N/V'$.
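
As a minimal sketch of items (i) through (iv), the quotient digraph can be built directly from a partition of the vertex set. The function name `quotient_digraph` and the representation of equivalence classes as frozensets are assumptions for illustration only.

```python
def quotient_digraph(arcs, root, leaves, classes):
    """Build the quotient of a network by a partition of its vertices.

    `classes` is an iterable of disjoint vertex sets covering V.
    Returns (vertices', arcs', root', leaf identification).
    """
    block = {}
    for cls in classes:
        cls = frozenset(cls)
        for v in cls:
            block[v] = cls                       # [v], the class containing v
    q_vertices = set(block.values())             # (i)
    q_root = block[root]                         # (ii)
    q_leaf = {x: block[x] for x in leaves}       # (iii)
    # (iv): an arc between distinct classes whenever some arc of N joins them.
    q_arcs = {(block[u], block[v]) for u, v in arcs if block[u] != block[v]}
    return q_vertices, q_arcs, q_root, q_leaf
```

The sketch does not verify that the relation is leaf-preserving; that hypothesis of Theorem 2.1 would have to be checked separately.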

Theorem 2.1. Let $N = (V, A, r, X)$ be an $X$-network. Suppose $\sim$ is a
leaf-preserving equivalence relation on $V$. Let
$N' = N/\sim = (V', A', r', X)$ be the quotient digraph. Then

(1) $N'$ is an $X$-network.

(2) The natural map $\phi : N \to N'$ given by $\phi(u) = [u]$ is a surjective
$X$-digraph map with kernel the set of equivalence classes under $\sim$.

(3) If each equivalence class $[u]$ is connected in $N$, then $\phi$ is connected.

Theorem 2.2. Let $N = (V, A, r, X)$ and $N' = (V', A', r', X)$ be $X$-networks.
Suppose $f : N \to N'$ is a surjective $X$-digraph map. Define the relation
$\sim$ on $V$ by $u \sim v$ iff $f(u) = f(v)$. Then $\sim$ is a leaf-preserving
equivalence relation and the equivalence classes are $[u] = f^{-1}(f(u))$.
Moreover the quotient digraph $N/\sim$ is isomorphic with $N'$ via the map
$\phi : N/\sim \to N'$ given by $\phi([u]) = f(u)$.

Theorem 2.3. Let $N$, $N'$, and $N''$ be $X$-networks. Let $f : N \to N'$ and
$g : N' \to N''$ be $X$-digraph maps.

(a) The composition $g \circ f : N \to N''$ is an $X$-digraph map.

(b) If $f$ and $g$ are surjective, then $g \circ f$ is surjective.

(c) If $f$ and $g$ are connected and surjective, then $g \circ f$ is connected
and surjective.

Suppose $N = (V, A, r, X)$ is an $X$-network. A partition $\mathcal{Q}$ of $V$ is
subordinate to a partition $\mathcal{P}$ of $V$ provided, for each
$A \in \mathcal{Q}$, there exists $B \in \mathcal{P}$ such that $A \subseteq B$.

Theorem 2.4. Let $N = (V, A, r, X)$ and $N' = (V', A', r', X)$ be $X$-networks.
Let $f : N \to N'$ be a surjective $X$-digraph map with kernel
$\mathcal{P} = \{f^{-1}(v') : v' \in V'\}$. Suppose $\mathcal{Q}$ is a partition
of $V$ that is subordinate to $\mathcal{P}$.

(1) There exist surjective $X$-digraph maps $g : N \to N/\mathcal{Q}$ and
$h : N/\mathcal{Q} \to N'$ such that $f = h \circ g$.

(2) If in addition $f$ is connected and each member of $\mathcal{Q}$ is
connected, then both $h$ and $g$ are connected.

Let $N = (V, A, r, X)$ and $N' = (V', A', r', X)$ be $X$-networks. Suppose
$f : N \to N'$ is a surjective digraph map. A wired lift of $N'$ is a subgraph
$M = (W, E)$ of $Und(N)$ such that the following hold:

(1) For each arc $(u', v')$ of $N'$ there is exactly one arc $(u, v)$ of $N$ such
that $f(u) = u'$, $f(v) = v'$, and $\{u, v\}$ is an edge of $M$. The set of all
edges $\{u, v\}$ so obtained will be denoted $E_1$ and the set of all vertices
which occur in any of the edges $\{u, v\} \in E_1$ will be denoted $V_1'$. Let
$V_1 = V_1' \cup X$.

(2) Every edge $\{a, b\} \in E$ either lies in $E_1$ or else satisfies
$f(a) = f(b)$.

(3) For each vertex $v'$ of $N'$, let $V_1(v') = \{w \in V_1 : f(w) = v'\}$. The
induced subgraph $M[f^{-1}(v') \cap W]$ is a tree with leaf set $V_1(v')$.

We call $E_1$ the set of nondegenerate edges of $M$, since the image under $f$ of
each such edge is an edge of $N'$, not just a single vertex. Note that
$W \subseteq V$ and $E \subseteq E(Und(N))$.

Intuitively, $M$ is a subgraph of $Und(N)$ that is a resolution of $Und(N')$ in
that for each vertex $v'$ of $N'$, $f^{-1}(v') \cap W$ consists of the vertices
of a tree, all of whose vertices map to $v'$, not necessarily a single point. The
name "lift" suggests that $N'$ is being lifted into the domain of $f$.

The following theorem gives sufficient conditions for a wired lift to exist
given any choice of $E_1$. The essential property is that $f$ be connected.

Theorem 2.5. Let $N = (V, A, r, X)$ and $N' = (V', A', r', X)$ be $X$-networks.
Suppose $f : N \to N'$ is a CSD map. For each arc $(u', v')$ of $N'$ choose an
arc $(u, v)$ of $N$ such that $f(u) = u'$, $f(v) = v'$. Let $E_1$ denote the set
of edges $\{u, v\}$ of $Und(N)$ so obtained. Then $f$ has a wired lift $M$ for
which $E_1$ is the set of nondegenerate edges. Each such wired lift $M$ is a
resolution of $Und(N')$.



3 Restricted sets 

Let N = (V, A, r, X) be a rooted acyclic X-network. We seek natural methods 
to assign standard networks of various sorts to N. For example, even if N has 
many hybridization events, we might be able to assign some standard tree that 
might correspond to some consensus species tree. 

This section proposes one such construction, which will be denoted $ResTr(N)$.
An example is given in Section 4. $ResTr(N)$ will have the form $N/\sim$ for a
certain equivalence relation $\sim$ on the vertices of $N$. Because of the
construction, there will be a CSD map $f : N \to ResTr(N)$. Consequently, by
Theorem 2.5, $ResTr(N)$ will have a wired lift into $N$.

In this section we shall assume that N = (V, A, r, X) is a rooted acyclic 
network with leaf set X. We shall sometimes assume that every leaf is tree- 
child (with indegree 1). 

The construction involves identifying subsets of V here called "restricted 
subsets." 

A set $B$ of vertices is called closed if, whenever $b_1$ and $b_2$ are in $B$
and $b_1 \le b_2$, then every vertex $v$ such that $b_1 \le v \le b_2$ also lies
in $B$.

A nonempty set $B$ of vertices not containing $r$ has restricted entry or is
restricted if there exists a unique vertex $w'$ such that

(1) $w' \notin B$,

(2) for some $b \in B$ there is an arc $(w', b)$,

(3) whenever $(w, b)$ is an arc, $w \notin B$, $b \in B$, then $w = w'$.

We call this unique vertex $w'$ the anchor of $B$ and write $Anc(B) = w'$. A set
$B$ of vertices containing $r$ has restricted entry or is restricted if there is
no arc $(w, b)$ with $b \in B$ and $w \notin B$.
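
The two cases of the definition can be sketched in Python as follows. The names `entering_vertices`, `is_restricted`, and `anchor` are hypothetical, and the sketch assumes the input is a rooted network given as a list of arcs.

```python
def entering_vertices(arcs, B):
    """Vertices outside B that have an arc into B."""
    B = set(B)
    return {w for w, b in arcs if w not in B and b in B}

def is_restricted(arcs, root, B):
    """Restricted entry: no entering vertex if root is in B, else exactly one."""
    B = set(B)
    if not B:
        return False
    entering = entering_vertices(arcs, B)
    if root in B:
        return not entering
    return len(entering) == 1

def anchor(arcs, root, B):
    """Return Anc(B), or None when B contains the root or is not restricted."""
    B = set(B)
    if root in B or not is_restricted(arcs, root, B):
        return None
    (w,) = entering_vertices(arcs, B)
    return w

# Illustrative network (not from the paper): h is a hybrid with parents a and b.
arcs = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("h", "x"), ("a", "y"), ("b", "z")]
print(is_restricted(arcs, "r", {"h"}))            # two entering vertices, a and b
print(anchor(arcs, "r", {"a", "b", "h"}))         # the only entering vertex
```

In this toy network the singleton containing the hybrid vertex is not restricted, but adding both of its parents produces a restricted set anchored at the root.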

Lemma 3.1. A restricted set $B$ is closed.

Proof. Suppose first that $B$ does not contain $r$. Suppose $b_1 \le v \le b_2$
with $b_1 \in B$, $b_2 \in B$, $v \notin B$. We may assume $(v, b_2)$ is an arc,
whence $v = Anc(B)$. But since $r$ is the root, there is a directed path $P$ from
$r$ to $b_1$; since $r \notin B$ and $b_1 \in B$, $Anc(B)$ lies on $P$. It
follows $Anc(B) < b_1 < Anc(B)$, so that $N$ has a directed cycle, contradicting
that $N$ is acyclic.

To see that $B$ is closed if $B$ contains $r$, suppose $b_1 \le v \le b_2$ with
$b_1 \in B$, $b_2 \in B$, $v \notin B$. We may assume $(v, b_2)$ is an arc,
contradicting that $B$ is restricted. □

Lemma 3.2. Let $N = (V, A, r, X)$ be an acyclic $X$-network. Suppose $B$ is
restricted and $r \notin B$. For every $b \in B$ there is a directed path from
$Anc(B)$ to $b$ such that all vertices on the path except $Anc(B)$ itself lie in
$B$.

Proof. Choose a path from $r$ to $b$, say $r = u_0, u_1, \ldots, u_k = b$. Since
$r \notin B$ and $u_k \in B$, there exists $i$ such that $u_i \notin B$,
$u_{i+1} \in B$. Since $B$ has an anchor, it follows $u_i = Anc(B)$. Since
$u_{i+1} \in B$ and $u_k \in B$, every vertex on the path from $u_{i+1}$ to $u_k$
lies in $B$ because $B$ is closed by Lemma 3.1. □



Theorem 3.3. Let $N = (V, A, r, X)$ be an acyclic $X$-network. Suppose $B$ and
$C$ are restricted subsets and $B \cap C$ is nonempty. Then $B \cup C$ is
restricted.

Proof. Assume $w \in B \cap C$. We prove the result via three cases.

Case 1. Suppose $r$ is in neither $B$ nor $C$. Then both $B$ and $C$ have anchors.
I claim first that either $Anc(B) \in C$ or $Anc(C) \in B$ or $Anc(B) = Anc(C)$.
To see this, suppose $Anc(B) \notin C$. Since $w \in B$, by Lemma 3.2 there is a
directed path $P$ from $Anc(B)$ to $w$ such that all vertices after the first lie
in $B$. Since $Anc(B) \notin C$ and $w \in C$, there is a vertex $v$ on the path
$P$ which is not in $C$ but whose child on the path lies in $C$. Hence
$v = Anc(C)$. It follows that either $Anc(C) = Anc(B)$ or else $Anc(C) \in B$.
This proves the claim.

Now there are three subcases:

Subcase (1a). Suppose $Anc(B) \in C$.
To show that $B \cup C$ is restricted, since $r \notin B \cup C$, it suffices to
show that $Anc(C)$ is an anchor for $B \cup C$. To see this, suppose $(u, d)$ is
an arc with $d \in B \cup C$ and $u \notin B \cup C$. If $d \in C$, then
$u = Anc(C)$. If $d \in B$, then $u = Anc(B)$, but this implies $u \in C$,
contradicting that $u \notin B \cup C$; so this latter case cannot occur.

Subcase (1b). Suppose $Anc(C) \in B$. Then $Anc(B)$ is an anchor for $B \cup C$
and $B \cup C$ is restricted by arguments like those in subcase (1a).

Subcase (1c). Suppose $Anc(B) = Anc(C)$. I claim $Anc(B)$ is an anchor for
$B \cup C$. To see this, suppose $(u, d)$ is an arc with $d \in B \cup C$ and
$u \notin B \cup C$. If $d \in B$ then $u = Anc(B)$. If $d \in C$ then
$u = Anc(C) = Anc(B)$.

Hence the result is true in Case 1.

Case 2. Suppose $B$ has an anchor but $r \in C$. I claim $B \cup C$ is
restricted. Since $r \in B \cup C$, we suppose $(u, d)$ is an arc with
$u \notin B \cup C$ but $d \in B \cup C$, and we derive a contradiction. Since
$C$ is restricted and contains $r$, it follows $d \notin C$. Hence $d \in B$ and
$u = Anc(B)$.

Since $B$ has an anchor and $w \in B$, by Lemma 3.2 there is a path from
$Anc(B)$ to $w$ such that all vertices after the first lie in $B$. Since $r$ is
the root, we obtain a path from $r$ to $Anc(B)$ and then to $w$. Since $w \in C$
and $C$ is closed by Lemma 3.1, it follows $Anc(B) \in C$. This contradicts that
$Anc(B) = u \notin B \cup C$. Hence the result is true in Case 2.

Case 3. Suppose $r \in B$ and $r \in C$. I claim $B \cup C$ is restricted. Since
$r \in B \cup C$ we suppose $(u, d)$ is an arc with $u \notin B \cup C$ but
$d \in B \cup C$, and we derive a contradiction. Note that we cannot have
$d \in B$ since $B$ is restricted, and we cannot have $d \in C$ since $C$ is
restricted. Hence the situation is not possible. □

Another way to combine restricted sets into a new restricted set is given in 
the next result: 

Lemma 3.4. Suppose $B$ and $C$ are restricted sets and there is an arc $(b, c)$
with $b \in B$ and $c \in C$. Then $B \cup C$ is restricted.

Proof. If $B$ and $C$ intersect, then the result follows from Theorem 3.3. So we
may assume that $B$ and $C$ are disjoint. Since $b \notin C$ and $C$ is
restricted, it follows that $b = Anc(C)$. Now suppose that $(u, v)$ is an arc
with $v \in B \cup C$ and $u \notin B \cup C$. If $v \in C$, then $u = Anc(C)$,
so $u \in B$, a contradiction. Hence $v \in B$, so $r \notin B$ and
$u = Anc(B)$. Since $u$ is uniquely determined, $B \cup C$ is restricted. □

Now, whenever $v \in V$, we construct an interesting restricted set denoted
$R(v)$. It will turn out that $R(v)$ is the smallest restricted set that contains
$v$. The basic construction is the following:

Algorithm Smallest restricted set

Input. An acyclic $X$-network $N = (V, A, r, X)$ and $v \in V$.

Output. A subset $R(v)$ of $V$.

Procedure: Define a sequence of sets $R_i$ of vertices as follows:

(1) Let $R_0 = \{v\}$.

(2) Recursively, given $R_i$ perform the following: Suppose there exist
$u \notin R_i$ and $w \in R_i$ with arc $(u, w)$. Let
$R_{i+1} := R_i \cup \{u\}$ if either of the following holds:

(a) there exists $w' \in R_i$ such that $u \not\le w'$;

(b) there exist $u' \in V - R_i$, $v' \in R_i$, $u' \ne u$, and arc $(u', v')$
such that $u \not\le u'$.

(3) Iterate the procedure until for some $m$, $R_m$ has been constructed and
there are no further changes possible according to (2). Define $R(v) = R_m$.

An example of the algorithm is given in section 4. 
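
The algorithm above can be sketched in Python under the assumption that the network is acyclic and given as a list of arcs. The names `reachability` and `smallest_restricted_set` are hypothetical; the relation $u \le w$ is computed by a plain depth-first search, which is one possible choice and not necessarily how the author would implement it.

```python
from collections import defaultdict

def reachability(arcs):
    """Return leq(u, v): True iff a (possibly trivial) directed path runs from u to v."""
    children = defaultdict(set)
    for u, v in arcs:
        children[u].add(v)
    cache = {}

    def descendants(u):
        if u not in cache:
            seen, stack = {u}, [u]
            while stack:
                for c in children[stack.pop()]:
                    if c not in seen:
                        seen.add(c)
                        stack.append(c)
            cache[u] = seen
        return cache[u]

    return lambda u, v: v in descendants(u)

def smallest_restricted_set(arcs, v):
    """Compute R(v) by repeatedly applying rules (2a) and (2b)."""
    leq = reachability(arcs)
    R = {v}
    changed = True
    while changed:
        changed = False
        # Candidates: vertices outside R with an arc into R.
        entering = {u for u, w in arcs if u not in R and w in R}
        for u in entering:
            rule_a = any(not leq(u, w) for w in R)
            rule_b = any(u2 != u and not leq(u, u2) for u2 in entering)
            if rule_a or rule_b:
                R.add(u)          # adjoin u, then recompute the candidates
                changed = True
                break
    return R

# Illustrative network (not from the paper): h is a hybrid with parents a and b.
arcs = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("h", "x"), ("a", "y"), ("b", "z")]
print(sorted(smallest_restricted_set(arcs, "h")))
```

In this toy network the hybrid vertex pulls in both of its parents, after which the root is the only entering vertex and the procedure stops.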

In step (2), if there are two vertices $u$ and $u'$ not in $R_i$, $u \ne u'$, and
arcs $(u, v')$, $(u', v'')$ with $v'$ and $v''$ in $R_i$, then we cannot have both
$u \le u'$ and $u' \le u$ since that would force $u = u'$. Hence at least one of
$u$ and $u'$ will be adjoined to $R_i$. It is possible that both $u$ and $u'$ will
be adjoined to $R_i$ in separate steps.

It is easy to see that R{v) is well-defined. This assertion means that when 
the algorithm terminates, the result R{v) is independent of the order in which 
the operations were carried out as long as they were legitimate when performed. 

To see this, suppose at a certain time we have $u_1$, $u_2$ not in $R_i$,
$v_1 \in R_i$, $v_2 \in R_i$, $u_1 \ne u_2$, and arcs $(u_1, v_1)$, $(u_2, v_2)$.
If there exists $w \in R_i$ and $u_1 \not\le w$, we could adjoin $u_1$.
Alternatively if we are able to adjoin $u_2$ first and then consider $u_1$, it is
still true that $w \in R_i$ and $u_1 \not\le w$, so we can still adjoin $u_1$.
Another possible scenario is that $u_1 \not\le u_2$ and $u_2 \not\le u_1$, so
either could be adjoined first. Then we may adjoin $u_1$ and at a later stage
$u_2$ still meets the criterion for adjoining $u_2$ since now $w' = u_1$ applies
for (2a).

Note that if $u$ has indegree 1, then $R(u) = \{u\}$ since no operation of type
(2) can be carried out.

Theorem 3.5. Let $N = (V, A, r, X)$ be an acyclic $X$-network. For each
$v \in V$, $R(v)$ is restricted.

Proof. Suppose first that $r \in R(v)$. We must show that there is no vertex $w$,
$w \notin R(v)$, such that there is an arc $(w, b)$ with $b \in R(v)$. Otherwise,
if such $w$ exists, then there is a path from $r$ to $w$ then to $b$ with
$w \notin R(v)$. Note that $R(v) = R_m$ for some $m$. Moreover $w \not\le r$
since $w \le r$ can happen only when $w = r$, and $w \notin R_m$ but
$r \in R_m$. Hence step (2a) could be used to define
$R_{m+1} := R_m \cup \{w\}$, contrary to the assumption that no more operations
of type (2) can be performed.

Now suppose that $r \notin R(v)$. We show that $R(v)$ has an anchor. Since
$r \notin R(v)$ there is a path from $r$ to some member $b \in R(v)$ and a vertex
$w$ on the path which is not in $R(v)$ but such that the next vertex on the path
lies in $R(v)$. This proves there exists $w \notin R(v)$ and an arc $(w, b)$ with
$b \in R(v)$. To have an anchor, this vertex $w$ must be unique, in which case
$R(v)$ is restricted. Suppose there were two vertices $w_1$ and $w_2$ with arcs
$(w_1, b_1)$ and $(w_2, b_2)$, $w_1 \ne w_2$, $w_1 \notin R(v)$,
$w_2 \notin R(v)$, $b_1 \in R(v)$, $b_2 \in R(v)$. Note that $R(v) = R_m$ for some
$m$. If $w_1 \not\le w_2$ then step (2b) could be used to enlarge $R_m$ by
adjoining $w_1$, and similarly if $w_2 \not\le w_1$ then $R_m$ could be enlarged
by adjoining $w_2$. Hence $w_1 \le w_2$ and $w_2 \le w_1$, implying $w_1 = w_2$,
a contradiction. This proves that the vertex $w$ is unique, so $R(v)$ is
restricted. □

The sets R{v) have other nice properties. The next result shows that R{v) 
is the smallest restricted set that contains v. 

Theorem 3.6. Let $N = (V, A, r, X)$ be an acyclic $X$-network. Suppose $B$ is a
restricted set and $w \in B$. Then $R(w) \subseteq B$.

Proof. Let the sequence $R_i$ be used to compute $R(w)$. Initially
$R_0 = \{w\} \subseteq B$. The proof will be by induction. We will assume
$R_i \subseteq B$ but the algorithm does not terminate with $R_i$. We will prove
$R_{i+1} \subseteq B$. It is immediate that $R_0 \subseteq B$. Note that
$R_{i+1}$ arises from $R_i$. Hence there exist $u \notin R_i$, $v \in R_i$, and
arc $(u, v)$ such that $u$ is adjoined to $R_i$ in one of two ways. We must show
that $u \in B$.

Suppose (2a) applies. Hence there exists $w' \in R_i$ such that $u \not\le w'$;
we show $u \in B$. If not, then since $v \in B$ and $B$ is restricted, it follows
$u = Anc(B)$. Since $w' \in R_i$, we have $w' \in B$ since $R_i \subseteq B$,
whence by Lemma 3.2, $u \le w'$, a contradiction. Hence $u \in B$ so
$R_{i+1} \subseteq B$.

Suppose instead (2b) applies. Hence there exist $u' \notin R_i$, $u \ne u'$,
$v' \in R_i$, and arc $(u', v')$, such that $u \not\le u'$. We show $u \in B$. If
not, then $u = Anc(B)$ since $v \in B$. We cannot have $u' \notin B$, since then
$u' = Anc(B) = u$. Hence $u' \in B$, whence by Lemma 3.2, $u = Anc(B) \le u'$, a
contradiction. This proves $u \in B$ so $R_{i+1} \subseteq B$. □

Corollary 3.7. If $u \in R(v)$, then $R(u) \subseteq R(v)$.

Proof. By Theorem 3.5, $R(v)$ is restricted. The result follows now from Theorem
3.6. □

In fact, given any subset $B$ of $V$, the algorithm computes the smallest
restricted set that contains $B$ provided that we use $R_0 = B$.

In general it need not be the case that a restricted set $B$ is connected. For
example, suppose that $N$ is an $X$-tree and the leaves $x$ and $y$ form a cherry,
so there are a vertex $u$ and arcs $(u, x)$, $(u, y)$ with no other arcs into $x$
or $y$. Then $\{x, y\}$ is restricted with anchor $u$ but is not connected.
Consequently, the following result that each set $R(v)$ is connected is of
interest.

Lemma 3.8. For $v \in V$, if $w \in R(v)$, then there is a directed path in $R(v)$
from $w$ to $v$. Moreover, $R(v)$ is connected.

Proof. We show the properties for each $R_i$ used to define $R(v)$. Initially
$R_0 = \{v\}$ and the properties are immediate. Each operation (2a) or (2b)
applied to $R_i$ results in a connected set $R_{i+1}$ and adds a vertex with a
path to $v$ inside $R_{i+1}$. □

Let $\mathcal{Q} = \{R(v) : v \in V\}$. Note that $\mathcal{Q}$ does not need to
be a partition of $V$, but each $v \in V$ lies in at least one member of
$\mathcal{Q}$.

We now make some modifications of $\mathcal{Q}$ to create a partition
$\mathcal{P}$ of $V$. Roughly we merge together members of $\mathcal{Q}$ that
have nonempty intersection until no more such merges can be performed.

More precisely, if $R(v)$ and $R(w)$ are in $\mathcal{Q}$, define
$R(v) \sim R(w)$ if $R(v) \cap R(w) \ne \emptyset$. Define $R(v) \approx R(w)$
iff there exist $v = v_0, v_1, \ldots, v_k = w$ such that for
$i = 0, \ldots, k$, $R(v_i) \in \mathcal{Q}$ and for $i = 0, \ldots, k-1$,
$R(v_i) \sim R(v_{i+1})$. Then $\approx$ is an equivalence relation. Define
$R'(v) = \cup \{R(w) : R(v) \approx R(w)\}$, so $R'(v)$ is the union of sets
equivalent to $R(v)$. Let $\mathcal{P} = \{R'(v)\}$ be the set of distinct sets
$R'(v)$. It is clear that $\mathcal{P}$ is a partition of $V$. For $v \in V$,
$R'(v)$ is the member of $\mathcal{P}$ containing $v$.
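
The merging step can be sketched as follows, taking an arbitrary family of sets as input (for instance the sets $R(v)$ computed beforehand). The function name `merge_overlapping` is hypothetical, and the simple quadratic loop is an illustrative choice rather than the paper's procedure.

```python
def merge_overlapping(sets):
    """Merge sets that share elements, transitively.

    Returns a list of pairwise disjoint frozensets whose union equals
    the union of the input sets.
    """
    blocks = []                       # invariant: blocks are pairwise disjoint
    for s in sets:
        s = set(s)
        keep = []
        for b in blocks:
            if b & s:
                s |= b                # absorb every block that meets s
            else:
                keep.append(b)
        keep.append(s)
        blocks = keep
    return [frozenset(b) for b in blocks]

# Illustrative input: smallest restricted sets of a toy network in which
# the set for a hybrid vertex h also contains its parents a and b.
family = [{"r"}, {"a"}, {"b"}, {"h", "a", "b"}, {"x"}, {"y"}, {"z"}]
print(merge_overlapping(family))
```

Because every block absorbed into the new set is removed from the list, the blocks stay disjoint after each step, which is what makes the result a partition when the input sets cover the vertex set.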

Lemma 3.9. Each set $R'(v)$ is a restricted subset of $V$ and is connected.

Proof. The fact that $R'(v)$ is restricted follows from Theorems 3.3 and 3.5 by
an obvious induction. That $R'(v)$ is connected follows from a similar induction,
also using Lemma 3.8. □

Let $ResTr(N) = N/\mathcal{P}$ be the quotient $X$-network. The map
$\phi : N \to ResTr(N)$ given by $\phi(v) = R'(v)$ will be called the natural
projection map.

Theorem 3.10. Suppose $N = (V, A, r, X)$ is a rooted acyclic network with leaf
set $X$ such that every leaf has indegree 1. Then $ResTr(N)$ is an $X$-network.
The natural projection map $\phi : N \to ResTr(N)$ is a CSD map.

Proof. For $x \in X$, since $x$ has indegree 1, it follows $R(x) = \{x\}$. If $v$
is not a leaf, then each $w \in R(v)$ satisfies $w \le v$ by Lemma 3.8; it follows
that a leaf $x$ cannot lie in $R(v)$ when $v$ is not a leaf. Hence
$R'(x) = \{x\}$. By Theorem 2.1, it follows that $ResTr(N)$ is a rooted digraph
with leaf set $X$. By Theorem 2.1 and Lemma 3.9, the natural projection map
$\phi : N \to ResTr(N)$ is a CSD map. □

It will turn out (Theorem 4.1) that $ResTr(N)$ is an $X$-tree, and we will call
it the (standard) restricted tree of $N$. The corresponding kernel $\mathcal{P}$
will be called the restricted tree kernel.



4 Restricted maps 

A CSD map $f : N \to N'$ with kernel $\mathcal{Q}$ is restricted if each member
of $\mathcal{Q}$ is restricted. Equivalently, $f$ is restricted if for each
vertex $v'$ of $N'$, $f^{-1}(v')$ is a restricted set.

The natural projection map $\phi : N \to ResTr(N)$ is a restricted map since
each member of the kernel $\mathcal{P}$ is a restricted set.

Suppose a network $N$ is successively cluster-distinct. Then a restricted set
$B$ is a natural generalization of a taxon unit in a tree. Each restricted set $B$
corresponds to a connected collection of taxa all deriving from the single taxon
$Anc(B)$. If $N$ is a tree, then each vertex is already restricted; the image of
a restricted map thus generalizes the notion of a tree. Note that [10] argues
that the extant human population forms a tight cluster. If $N$ is successively
cluster-distinct, the same argument would suggest that it forms a restricted set.

A restricted CSD map $f : N \to N'$ is universal (for restricted maps) provided
that given any restricted map $g : N \to N''$ there is a unique restricted CSD
map $h : N' \to N''$ such that $g = h \circ f$.

We shall see below that the natural projection map $\phi : N \to ResTr(N)$ is
universal for restricted maps.

The first result is that the image of a restricted map is always a tree. 

Theorem 4.1. Let $N = (V, A, r, X)$ be an acyclic X-network and let
$T = (V', A', r', X)$ be an X-network. Assume $f : N \to T$ is a restricted
CSD map. Then $T$ is a tree.

Proof. We show that $T$ has no hybrid vertices. Suppose otherwise, so we may
assume $V'$ contains distinct vertices $u'_1$, $u'_2$, and $u'_3$ while $A'$
contains arcs $(u'_1, u'_3)$ and $(u'_2, u'_3)$. Let $B_i = f^{-1}(u'_i)$.
Since $f$ is restricted, each $B_i$ is a restricted set. Since $f$ is a CSD
map, there exist $u_1 \in B_1$, $w_1 \in B_3$, $u_2 \in B_2$, and
$w_2 \in B_3$ such that $(u_1, w_1)$ and $(u_2, w_2)$ are arcs of $N$. Note
$u_1 \notin B_3$ and $u_2 \notin B_3$. Since $B_3$ is restricted, it follows
that $u_1 = \mathrm{Anc}(B_3)$ and $u_2 = \mathrm{Anc}(B_3)$. Hence
$u_1 = u_2$, so $u'_1 = f(u_1) = f(u_2) = u'_2$, a contradiction. □
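
The conclusion of Theorem 4.1 can be observed on a toy example. The sketch
below is a hedged illustration rather than part of the proof: it assumes that
the arcs of the image network are exactly the pairs $(f(u), f(v))$ for arcs
$(u, v)$ of $N$ with $f(u) \ne f(v)$, and that a hybrid vertex is one of
indegree at least 2. The network and the names `image_arcs`,
`hybrid_vertices`, and `collapsed` are hypothetical.

```python
def image_arcs(arcs, f):
    """Arcs (f(u), f(v)) of the image digraph, omitting collapsed arcs."""
    return {(f[u], f[v]) for (u, v) in arcs if f[u] != f[v]}

def hybrid_vertices(arcs):
    """Vertices with at least two distinct parents."""
    parents = {}
    for u, v in arcs:
        parents.setdefault(v, set()).add(u)
    return {v for v, ps in parents.items() if len(ps) >= 2}

# Hypothetical toy network: root r, hybrid vertex h with parents a and b.
arcs = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"), ("h", "x")]

# A map whose fibers {r}, {a, b, h}, {x} each have at most one entering vertex.
collapsed = {"r": "r", "a": "m", "b": "m", "h": "m", "x": "x"}

tree_arcs = image_arcs(arcs, collapsed)
```

In this example `hybrid_vertices(arcs)` is `{"h"}`, whereas the image digraph
`tree_arcs` consists of the path from `r` through `m` to `x` and has no hybrid
vertex.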

Corollary 4.2. Let $N = (V, A, r, X)$ be an acyclic X-network. Then
$\mathrm{ResTr}(N)$ is an X-tree.

Proof. The natural projection map $\phi : N \to \mathrm{ResTr}(N)$ is a
restricted CSD map, so the result follows from Theorem 4.1. □

The next result shows that many relationships among leaves observed in
$\mathrm{ResTr}(N)$ are also present in $N$.

Corollary 4.3. Let $N = (V, A, r, X)$ be an acyclic X-network. There is a
wired lift of $\mathrm{ResTr}(N)$ into $N$.

Proof. This follows from Theorem 2.5. □

Restricted maps have interesting functorial properties, as seen in the next 
results. 

Lemma 4.4. Suppose $N = (V, A, r, X)$ and $N' = (V', A', r', X)$ are
X-networks. Suppose $f : N \to N'$ is restricted. If $B \subseteq V'$ is
restricted and connected, then $f^{-1}(B)$ is restricted.

Proof. Let $B = \{w'_1, w'_2, \dots, w'_m\} \subseteq V'$. Since $B$ is
connected, for some $p$ there exist $p$ arcs
$(w'_{i_1}, w'_{j_1}), \dots, (w'_{i_p}, w'_{j_p})$ such that the arcs connect
the members of $B$. Now each set $f^{-1}(w'_i)$ is restricted. Since $f$ is a
CSD map, for each $k$ there is an arc $(w_{i_k}, w_{j_k})$ with
$w_{i_k} \in f^{-1}(w'_{i_k})$ and $w_{j_k} \in f^{-1}(w'_{j_k})$. The result
now follows from Lemma 3.4. □

Theorem 4.5. Suppose $N = (V, A, r, X)$, $N' = (V', A', r', X)$, and
$N'' = (V'', A'', r'', X)$ are X-networks and $f : N \to N'$ and
$g : N' \to N''$ are restricted CSD maps. Then the composition
$g \circ f : N \to N''$ is restricted.

Proof. Suppose $v'' \in V''$. We must show that
$(g \circ f)^{-1}(v'') = f^{-1}(g^{-1}(v''))$ is restricted. Since $g$ is a
restricted CSD map, $g^{-1}(v'')$ is restricted and connected. Since $f$ is
restricted, $f^{-1}(g^{-1}(v''))$ is also restricted by Lemma 4.4. □

We can now prove the universality property of ResTr(N). 

Theorem 4.6. Let $\phi : N \to \mathrm{ResTr}(N)$ be the natural projection
map. Then $\phi$ is universal for restricted maps.

Proof. Let $g : N \to T$ be a restricted map with kernel $Q$. Let $P$ denote
the kernel of $\phi$. Note that each member $B \in P$ is restricted and each
member $C \in Q$ is restricted.

We use Theorem 2.4 to define a CSD map $h : \mathrm{ResTr}(N) \to T$ such that
$g = h \circ \phi$. We first show that $P$ is subordinate to $Q$. Let
$B \in P$. We must show that there exists a member $C \in Q$ such that
$B \subseteq C$.

For any vertex $v$ of $N$ there exists $C(v) \in Q$ such that $v \in C(v)$
since $Q$ is a partition. Since $C(v)$ is restricted, $R(v) \subseteq C(v)$ by
Theorem 3.4. If $R(v')$ intersects $R(v)$ then $C(v)$ intersects $C(v')$,
whence because $Q$ is a partition it follows that $C(v) = C(v')$; hence
$R(v) \cup R(v') \subseteq C(v)$. A simple induction then shows that the
member $B \in P$ that contains $v$ satisfies $B \subseteq C(v)$. This shows
that $P$ is subordinate to $Q$. □
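
The subordination step can be sketched in Python. This is an illustration
under assumptions, not the construction of Theorem 2.4 itself: partitions are
represented as lists of sets, and the hypothetical helper `induced_map` sends
each block of $P$ to the block of $Q$ containing it, or returns `None` when
$P$ is not subordinate to $Q$. The blocks used below are only a fragment,
chosen to be consistent with our reading of the values of $g$ quoted for the
example of Figure 2; they do not list every vertex of that network.

```python
def induced_map(P, Q):
    """Map each block of P to the block of Q that contains it.

    Returns None if some block of P lies in no block of Q, i.e. if P is
    not subordinate to Q. When Q is a partition the containing block is
    unique, so the first match is the only one.
    """
    h = {}
    for B in P:
        containing = [C for C in Q if set(B) <= set(C)]
        if not containing:
            return None
        h[frozenset(B)] = frozenset(containing[0])
    return h

# Fragment of a finer partition P (assumed blocks of the kernel of phi).
P = [{10}, {11}, set(range(12, 19)), {19}]

# Fragment of a coarser partition Q (assumed blocks of the kernel of g).
Q = [{10, 11}, set(range(12, 20))]

h = induced_map(P, Q)
```

With these blocks, `h` sends both `{10}` and `{11}` to `{10, 11}` and sends
both `{12, ..., 18}` and `{19}` to `{12, ..., 19}`, while
`induced_map(Q, P)` returns `None` because the coarser partition is not
subordinate to the finer one.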

For an example, consider the network $N$ shown in Figure 1. We demonstrate
the construction of $\mathrm{ResTr}(N)$, also shown in Figure 1. Let $v$ be a
vertex of $N$. If $v \notin \{14, 16\}$, then $R(v) = \{v\}$ since $v$ has
indegree 1. To compute $R(16)$, initially $R_0 = \{16\}$. Since $(15, 16)$ and
$(17, 16)$ are arcs and $15 \ne 17$, we can add 15 to $R_0$ by (2b), yielding
$R_1 = \{15, 16\}$. Since $17 \not\le 15$ we can add 17 to $R_1$ by (2a),
yielding $R_2 = \{15, 16, 17\}$. Since $(12, 15)$ is an arc and
$12 \not\le 17$, we can adjoin 12 by (2a), so $R_3 = \{12, 15, 16, 17\}$.
Since $(18, 17)$ is an arc and $18 \not\le 12$, we can adjoin 18 by (2a), so
$R_4 = \{12, 15, 16, 17, 18\}$. Now the only arcs $(u, w)$ with $w \in R_4$
and $u \notin R_4$ are $(11, 12)$ and $(11, 18)$, so we cannot adjoin 11 using
(2b). For all $w \in R_4$, $11 \le w$, so we cannot adjoin 11 by (2a). Hence
the algorithm terminates with $R(16) = R_4 = \{12, 15, 16, 17, 18\}$.

Similarly $R(14) = \{13, 14, 15\}$. Since $R(16) \cap R(14) = \{15\}$ is
nonempty, $R'(14) = R'(16) = R(14) \cup R(16) = \{12, 13, 14, 15, 16, 17, 18\}$.
For all $v \notin R'(16)$, $R'(v) = R(v) = \{v\}$. Now $\mathrm{ResTr}(N)$ is
the quotient digraph.
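
The passage from the sets $R(v)$ to the sets $R'(v)$ in this example amounts
to merging sets that overlap. A minimal Python sketch of that merging step
follows; it takes the sets $R(v)$ as given input (it does not implement rules
(2a) and (2b)), lists only the vertices 11 through 18 of the example, and uses
the hypothetical helper name `merge_overlapping`.

```python
def merge_overlapping(sets):
    """Merge sets that share an element until all blocks are disjoint."""
    blocks = []
    for s in sets:
        merged = set(s)
        remaining = []
        for b in blocks:
            if b & merged:
                # This block meets the new set, so absorb it.
                merged |= b
            else:
                remaining.append(b)
        remaining.append(merged)
        blocks = remaining
    return blocks

# The sets R(v) from the example, for the vertices 11, ..., 18 only.
R = {
    11: {11},
    12: {12},
    13: {13},
    14: {13, 14, 15},
    15: {15},
    16: {12, 15, 16, 17, 18},
    17: {17},
    18: {18},
}

blocks = merge_overlapping(R.values())
```

The result consists of the two blocks `{11}` and
`{12, 13, 14, 15, 16, 17, 18}`, the latter being $R'(14) = R'(16)$ as computed
above.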

As promised, $\mathrm{ResTr}(N)$ is a tree. Note that the resolution of the
cluster $\{3, 4\}$ in $N$ is lost while that of $\{5, 6\}$ is preserved; this
is because the hybrid vertex 16 has outdegree 1 while the hybrid vertex 14 has
outdegree 2. The natural projection map $\phi$ takes $\phi(v) = v$ except that
for $v \in R'(16)$, $\phi(v) = R'(16)$.

Figure 1: An X-network $N$ with $X = \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$ and
$\mathrm{ResTr}(N)$.

To illustrate the universality of the map in this example, consider the map
$f : N \to T$ where Figure 2 shows $N$ and $T$ in which the vertices of $N$
have been labelled by the vertices of $T$ in order to display the map $f$. One
checks that $f$ is restricted. For example, $f^{-1}(a)$ is the set of vertices
in $N$ labelled $a$ and is restricted. Then $f$ factors as $f = g \circ \phi$
where $g : \mathrm{ResTr}(N) \to T$ satisfies $g(R'(16)) = a = g(19)$,
$g(11) = 10$, $g(20) = b$, and for other vertices $v$ of $\mathrm{ResTr}(N)$,
$g(v) = v$.

It is interesting that the vertex 11 in $\mathrm{ResTr}(N)$ cannot be removed
from $\mathrm{ResTr}(N)$ by contracting the arc $(11, R'(16))$ while still
retaining universality. In the example of Figure 2, both 11 and 10 in
$\mathrm{ResTr}(N)$ are mapped to 10 in $T$. But a simple modification could
yield an example in which 11 and 10 in $\mathrm{ResTr}(N)$ must go to distinct
vertices of the modified $T$.

The network $\mathrm{ResTr}(N)$ detects narrow bottlenecks in $N$. Perhaps it
is most appropriate to apply it to $\mathrm{ClDis}(N)$ (see [23]) rather than
to $N$ itself, since large regions in $N$ of vertices all with the same
cluster can become bottlenecks in $\mathrm{ClDis}(N)$.